
Move runtime infrastructure out of low user-VA #23

Merged: jserv merged 1 commit into main from elf on May 10, 2026


@jserv commented on May 10, 2026

The page-table pool and EL1 shim previously sat at fixed low addresses [0x10000, 0x400000), colliding with low-linked ET_EXECs. Android linker64 binaries link at 0x200000 and the loader accepted them, but sys_mprotect, sys_munmap, sys_mmap with MAP_FIXED, and rt_sigreturn then rejected any operation on the overlapping pages with a bare EINVAL as soon as the binary tried to apply RELRO to its data segment.

Relocate the page-table pool, shim code, and shim data into a 4 MiB reserve placed just below g->interp_base, in the dead zone between g->mmap_limit and g->interp_base. PT_POOL_BASE, SHIM_BASE, and SHIM_DATA_BASE become runtime guest_t fields computed by compute_infra_layout from guest_size; for 36-bit IPA the reserve sits at [60 GiB - 4 MiB, 60 GiB), for 40-bit IPA at [1020 GiB - 4 MiB, 1020 GiB). Two helpers, guest_range_hits_infra and guest_addr_in_infra, retarget the four infra guards at the new range without weakening their security intent. The 64 KiB null-guard slot at the bottom of the reserve is covered too, so guest mmap state cannot semantically reserve it either.
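A minimal sketch of how a reserve placed just below interp_base and the two range checks could look. The struct, field names, and function names here are hypothetical stand-ins, not the actual guest_t layout or the signatures of guest_range_hits_infra and guest_addr_in_infra; treating a wrapping range as a hit and a zero-length range as a miss are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (1ULL << 30)
#define INFRA_RESERVE_SIZE (4ULL << 20) /* 4 MiB, per the description */

/* Hypothetical stand-in for the relevant guest_t fields. */
typedef struct {
    uint64_t infra_base; /* first byte of the reserve */
    uint64_t infra_end;  /* one past the last byte; equals interp_base */
} infra_layout_t;

/* Place the reserve directly below interp_base. */
static infra_layout_t infra_layout_below(uint64_t interp_base)
{
    assert(interp_base >= INFRA_RESERVE_SIZE); /* subtraction must not wrap */
    infra_layout_t l = { interp_base - INFRA_RESERVE_SIZE, interp_base };
    return l;
}

/* True when [addr, addr + len) overlaps the reserve. */
static bool range_hits_infra(const infra_layout_t *l, uint64_t addr, uint64_t len)
{
    if (len == 0)
        return false;            /* assumption: empty ranges touch nothing */
    if (addr + len < addr)
        return true;             /* assumption: a wrapping range is rejected */
    return addr < l->infra_end && addr + len > l->infra_base;
}

/* True when a single address lies inside the reserve. */
static bool addr_in_infra(const infra_layout_t *l, uint64_t addr)
{
    return addr >= l->infra_base && addr < l->infra_end;
}
```

With interp_base at 60 GiB (the 36-bit IPA case above), a page at 0x200000 no longer overlaps the reserve, which is the situation the low-linked binaries need.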

Bump fork IPC to v9 to carry elf_load_min so nested forks from low-linked ET_EXECs see the actual load address rather than the legacy ELF_DEFAULT_BASE constant. Validate hdr.ipa_bits, hdr.guest_size, and the page-aligned in-pool location of hdr.pt_pool_next and hdr.ttbr0 in the child path before any size-derived arithmetic so a malformed header cannot underflow interp_base or misalign the page-table walker.
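The validation order described above can be sketched as follows. The header struct is a hypothetical subset of the real IPC header, and several details are assumptions made only for illustration: that guest_size equals 1 << ipa_bits, that interp_base sits 4 GiB below guest_size (inferred from the 60 GiB and 1020 GiB figures), and that the whole 4 MiB reserve bounds the pool pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INFRA_RESERVE_SIZE (4ULL << 20) /* 4 MiB, per the description */
#define INTERP_GAP         (4ULL << 30) /* assumption: guest_size - interp_base */

/* Hypothetical subset of the fork IPC header. */
typedef struct {
    uint32_t ipa_bits;
    uint64_t guest_size;
    uint64_t pt_pool_next;
    uint64_t ttbr0;
} fork_hdr_t;

/* page must be a power of two. */
static bool page_aligned_in(uint64_t v, uint64_t lo, uint64_t hi, uint64_t page)
{
    return (v & (page - 1)) == 0 && v >= lo && v < hi;
}

static bool fork_hdr_valid(const fork_hdr_t *h, uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);

    /* Step 1: reject bad sizes before deriving anything from them. */
    if (h->ipa_bits != 36 && h->ipa_bits != 40)
        return false;
    if (h->guest_size != (1ULL << h->ipa_bits))
        return false;

    /* Step 2: guest_size is at least 64 GiB here, so neither
     * subtraction below can underflow. */
    uint64_t interp_base = h->guest_size - INTERP_GAP;
    uint64_t lo = interp_base - INFRA_RESERVE_SIZE;

    /* Step 3: both page-table pointers must be aligned and in range. */
    return page_aligned_in(h->pt_pool_next, lo, interp_base, page) &&
           page_aligned_in(h->ttbr0, lo, interp_base, page);
}
```

The point of the ordering is that a header with guest_size of zero is rejected in step 1, so the subtraction in step 2 never sees it.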

Plumb guest_t through thread_alloc_sp_el1 and record the slot index in thread_entry_t so thread_free_sp_el1_locked can clear the bitmap from teardown contexts (thread_{deactivate,destroy_all_vcpus,ptrace_wait}) that lack a guest_t reference.
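One way to picture the slot bookkeeping: allocation needs the guest to compute an address from the now per-guest base, while freeing only needs the recorded index. Everything below (type names, the 64-slot bitmap, the stride, a process-wide bitmap) is a hypothetical sketch rather than the project's actual data structures, and locking is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define SP_EL1_SLOTS  64      /* hypothetical slot count */
#define SP_EL1_STRIDE 0x1000u /* hypothetical per-slot stack size */

/* Hypothetical stand-ins for guest_t and thread_entry_t. */
typedef struct { uint64_t shim_data_base; } guest_sketch_t;
typedef struct { int sp_el1_slot; } thread_sketch_t;

static uint64_t sp_el1_bitmap; /* bit i set means slot i is in use */

/* Allocate a slot; the guest supplies the runtime base address.
 * Returns the top-of-stack address, or 0 when no slot is free. */
static uint64_t sp_el1_alloc(const guest_sketch_t *g, thread_sketch_t *t)
{
    for (int i = 0; i < SP_EL1_SLOTS; i++) {
        if (!(sp_el1_bitmap & (1ULL << i))) {
            sp_el1_bitmap |= 1ULL << i;
            t->sp_el1_slot = i; /* remembered for teardown */
            return g->shim_data_base + (uint64_t)(i + 1) * SP_EL1_STRIDE;
        }
    }
    t->sp_el1_slot = -1;
    return 0;
}

/* Free using only the recorded index; no guest reference required. */
static void sp_el1_free(thread_sketch_t *t)
{
    if (t->sp_el1_slot < 0)
        return;
    assert(t->sp_el1_slot < SP_EL1_SLOTS);
    sp_el1_bitmap &= ~(1ULL << t->sp_el1_slot);
    t->sp_el1_slot = -1;
}
```

With a fixed base, the index could be recovered from the stack address alone; once the base is a runtime field, recording the index avoids that reverse computation in paths that have no guest pointer.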

Add tests/test-fork-lowbase.c, a static ET_EXEC linked at 0x200000 that exercises a nested fork. The grandchild only completes when the intermediate child has preserved elf_load_min across the IPC handoff.
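The shape of such a nested-fork check can be sketched with plain POSIX calls. This is not the contents of tests/test-fork-lowbase.c; the function names and exit codes are hypothetical, and the link flags that place the binary at 0x200000 are not shown here because the description does not give them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for pid and return its exit code, or -1 on abnormal exit. */
static int wait_exit_code(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Parent forks a child, the child forks a grandchild.
 * Returns 0 only if the grandchild ran and exited with code 42. */
static int nested_fork_roundtrip(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(2);
        if (grandchild == 0)
            _exit(42); /* grandchild: reaching here is the success signal */
        _exit(wait_exit_code(grandchild) == 42 ? 0 : 1);
    }
    return wait_exit_code(child);
}
```

On an ordinary host this only checks process plumbing; the interesting case is running it as a low-linked static binary under the emulator, where the second fork depends on the intermediate child having received the real load address.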


Summary by cubic

Moves the page‑table pool and EL1 shim out of low user VA to avoid collisions with low‑linked ET_EXEC binaries. Places them in a 4 MiB reserve just below interp_base and preserves the true ELF load base across fork (IPC v9) to fix nested forks.

  • Bug Fixes

    • Relocated runtime infra to a computed high‑IPA reserve; pt_pool_base, shim_base, and shim_data_base are now per‑guest.
    • Added guest_range_hits_infra and guest_addr_in_infra to block mmap(MAP_FIXED), munmap, mprotect, and rt_sigreturn from touching infra memory.
    • Fork IPC v9 carries elf_load_min; child validates ipa_bits, guest_size, and page‑aligned in‑pool pt_pool_next/ttbr0. Adds test-fork-lowbase (non‑PIE at 0x200000) to verify nested fork behavior.
  • Refactors

    • Track elf_load_min and use it when snapshotting ELF+brk for fork.
    • thread_alloc_sp_el1(g, t) now takes guest_t and records the slot index so teardown can free without a guest_t.
    • Boot/exec/reset paths load and account shim regions via g->shim_base/g->shim_data_base; icache invalidation and used‑region reporting updated accordingly.

Written for commit 40a759e. Summary will update on new commits.


@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 14 files

@jserv jserv merged commit 059fb2b into main May 10, 2026
5 checks passed
@jserv jserv deleted the elf branch May 10, 2026 15:50
@jserv jserv mentioned this pull request May 10, 2026